14. SVM Classifier
Learning from Feature Vectors
So far, we know how to create a HOG feature vector for a small image. The next step is using this data to train a model and classify objects!
To do this, we'll be using a Support Vector Machine (SVM). SVMs work to best separate labeled data into groups, and they train until they reach an acceptably low error rate.
In the case of images and HOG feature classification, the SVM will train on sets of labeled images that also have associated HOG feature vectors. It will learn the association and try to classify new images by looking at their HOG feature vector.
Classifying from a compact feature vector is a lot faster than processing every pixel of a large image file!
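To make "separating labeled data" concrete, here is a minimal pure-Python sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is an illustration only: the function names, the learning rate, and the tiny two-dimensional "feature vectors" are all assumptions made for this example, and OpenCV's own solver works differently.

```python
def train_linear_svm(samples, labels, c=1.0, lr=0.01, epochs=200):
    """Fit weights w and bias b for a soft-margin linear SVM.

    Minimizes 0.5*||w||^2 + c * sum(max(0, 1 - y*(w.x + b)))
    by per-sample sub-gradient steps. Labels must be +1 or -1.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is misclassified or inside the margin:
                # shrink w and push the boundary away from the sample
                w = [wi - lr * (wi - c * y * xi) for wi, xi in zip(w, x)]
                b += lr * c * y
            else:
                # Sample is safely classified: only the shrink term applies
                w = [wi - lr * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy stand-ins for feature vectors: +1 = banana, -1 = not-banana
samples = [(2.0, 2.5), (3.0, 3.0), (2.5, 3.5),
           (-2.0, -1.5), (-3.0, -2.5), (-2.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print(predictions)
```

A real HOG vector has far more than two entries, but the idea is the same: training finds a boundary in feature space, and a new image is classified by which side of that boundary its feature vector falls on.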
SVMs in OpenCV
Note: At this point, this course does not provide a large dataset, but you can find many to work with on Kaggle or through ImageNet, to name two sources.
To create an SVM in OpenCV, we define its parameters and call a constructor.
import cv2

# Define the SVM training parameters (OpenCV 2.4 Python API)
svm_params = dict(kernel_type=cv2.SVM_LINEAR,
                  svm_type=cv2.SVM_C_SVC,
                  C=2.67, gamma=5.383)
# Initialize the SVM
svm = cv2.SVM()
Training
Next, you'll need to prepare your training data, associating each image with its label and its computed HOG feature vector. You should have a set of feature vectors for every label you want to detect; for example, you'll need at least two sets to distinguish banana from not-banana images.
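```python
import glob
import cv2
import numpy as np

# Read in sets of images and their labels
# (assumes one .txt file per image, each holding a single integer class id)
image_paths = sorted(glob.glob('*.jpeg'))
label_paths = sorted(glob.glob('*.txt'))
labels = np.float32([int(open(path).read()) for path in label_paths])

# Form your HOG training data
# (hog() is assumed to be your HOG feature function from the previous step)
hog_data = [hog(cv2.imread(path)) for path in image_paths]
training_data = np.float32(hog_data)

# Train and save your SVM
svm.train(training_data, labels, params=svm_params)
svm.save('svm_model.dat')
```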
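If your feature vectors are grouped by class rather than stored with one label file per image, they need to be flattened into one list of samples and a parallel list of numeric labels before training. Here is a small sketch of that step; the function name `build_training_set` and the short example vectors are hypothetical, chosen only for illustration.

```python
def build_training_set(features_by_class):
    """Flatten {class_name: [feature_vector, ...]} into parallel lists.

    Returns (samples, labels, class_names), where labels[i] is the index
    into class_names of the class that samples[i] belongs to.
    """
    class_names = sorted(features_by_class)
    samples, labels = [], []
    for class_id, name in enumerate(class_names):
        for vector in features_by_class[name]:
            samples.append(list(vector))
            labels.append(class_id)
    return samples, labels, class_names


# Made-up two-element vectors standing in for real HOG feature vectors
features = {
    'banana': [[0.1, 0.9], [0.2, 0.8]],
    'not_banana': [[0.9, 0.1]],
}
samples, labels, class_names = build_training_set(features)
print(labels, class_names)
```

The resulting lists can then be converted with `np.float32(samples)` and passed to the SVM along with the labels.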
Testing
Finally, you'll need to test your model to verify its classification accuracy.
# After reading in the test data and determining the HOG feature vectors
test_data = np.float32(test_hog_data)
# Test the SVM model
result = svm.predict_all(test_data)
# Check the accuracy
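# test_labels holds the true labels of the test images; flatten both arrays
# so an N x 1 result is compared element by element instead of broadcasting
mask = result.ravel() == np.ravel(test_labels)
correct = np.count_nonzero(mask)
print(correct * 100.0 / result.size)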
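A single accuracy percentage can hide the fact that one class is recognized much better than another. As a rough sketch, the pure-Python helper below reports overall accuracy together with per-class accuracy; the name `accuracy_report` and the sample label lists are assumptions made for this example.

```python
from collections import Counter


def accuracy_report(predicted, expected):
    """Return (overall_percent, {label: percent_correct_for_that_label})."""
    # Count how often each (true label, predicted label) pair occurs
    pairs = Counter(zip(expected, predicted))
    correct = sum(n for (truth, guess), n in pairs.items() if truth == guess)
    overall = 100.0 * correct / len(expected)

    per_class = {}
    for label in set(expected):
        total = sum(n for (truth, _), n in pairs.items() if truth == label)
        per_class[label] = 100.0 * pairs[(label, label)] / total
    return overall, per_class


# Made-up predictions: 1 = banana, 0 = not-banana
predicted = [1, 1, 0, 0, 1]
expected = [1, 1, 1, 0, 0]
overall, per_class = accuracy_report(predicted, expected)
print(overall, per_class)
```

With NumPy arrays from the SVM, you could pass `result.ravel().tolist()` and the flattened true labels to the same function.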